-
-
Notifications
You must be signed in to change notification settings - Fork 11
Add handling for carriage returns and backspaces #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for opening this! I will test locally, but I think we should also add tests mirroring the tests in JupyterLab. Let me know if you need help with that. Hi @echarles @eleonorecharles can you approve the CI? It looks like my rights were downgraded down to "triage" rather than to "maintain" so I can no longer help with that. |
@krassowski Just bumped back to maintain. I think that was changed as part of our ISO/SOC certifications initiatives, sorry for that. |
@krassowski the tests would be really helpful, can you point me to some docs or the right files to add them? I'd love to implement them. |
It think we could create a new |
Thank you, I'll get on that in the coming few days once I'm free. |
@aftoul Apart from the test, is there anything else remaining before merging? |
I'm testing this now in a notebook with some logging added... --- jupyter_server_nbmodel/actions.py
+++ jupyter_server_nbmodel/actions.py
@@ -162,28 +162,32 @@ def _output_hook(outputs: list[NotebookNode], ycell: y.Map | None, msg: dict) ->
if text.endswith((os.linesep, "\n")):
text = text[:-1]
if (not cell_outputs) or (cell_outputs[-1]["name"] != output["name"]):
output["text"] = [handle_carriage_return(handle_backspace(text))]
+ get_logger().info("??? text=%r | backspace => %r | cr => %r", text, handle_backspace(text), output["text"])
cell_outputs.append(output)
else:
last_output = cell_outputs[-1]
old_text = last_output["text"][-1] if len(last_output["text"]) > 0 else ""
combined_text = old_text + text
if '\r' in combined_text or '\b' in combined_text:
if combined_text[-1] == '\r':
suffix = '\r'
combined_text = combined_text[:-1]
else:
suffix = ''
new_text = handle_carriage_return(handle_backspace(combined_text)) + suffix
+ get_logger().info("??? (combined_text=%r | backspace => %r | cr => %r) + suffix %r",
+ combined_text, handle_backspace(combined_text), handle_carriage_return(handle_backspace(combined_text)), suffix)
last_output["text"][-1] = new_text
else:
last_output["text"].append(text)
cell_outputs[-1] = last_output
+ get_logger().warning("@@@ text=%r, cell_outputs=%s", output["text"], cell_outputs)
I feel there may be a second issue here — you're now handling \b \r, but there was also an issue of starting new lines when the sent data didn't contain \n. |
old_text = last_output["text"][-1] if len(last_output["text"]) > 0 else "" | ||
combined_text = old_text + text | ||
if '\r' in combined_text or '\b' in combined_text: | ||
if combined_text[-1] == '\r': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's strange to me that this needs to be special-cased here. I hope it can be folded into the handle_...() functions.
Why do I say that? A terminal has a state machine processing incoming characters, but never cares whether they arrived in one buffer vs. char-by-char, and a char's handling is independent of being first/middle/last in some buffer.
[in practice, there may be some timeouts for incomplete escape sequences? I know on input emacs distinguishes ESC press from prefix by timeout, but don't think terminals do this? Anyway \r and \b are single-char "atomic" sequences, so that's irrelevant.]
It's possible that modeling the state as old_text
fed back into the functions same as arriving characters is insufficient. 🤔
block-beta
columns 6
space space state[("old_text")] space space kernel
space space space space space space
handle_cr space handle_b space space space
kernel-- "+ incoming text" -->state
state --> handle_b
handle_b --> handle_cr
handle_cr --> state
More formally, these functions could take 2 args (state, incoming text) and reduce it to updated state? Not sure whether that matters here.
- For starters, cursor position is now modeled by cutting off/replacing some trailing characters but perhaps is more natural to model state as (text, cursor) pair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the special case for when incoming text contains \r at the end to ensure the text is shown. Otherwise, the text may be discarded too early and never be shown. So yeah, I guess that is a weakness of the old_text
/new_text
approach.
The other issue that you pointed out is due to appending incoming text as new lines when neither the old text nor the new one contain a special character. I don't remember if that was part of the original behavior, but what would be the ideal way to handle that here? I'm thinking of concatenating the new output to the end of the last cell until \n is encountered or doing away with the lists and replacing them with a (text, cursor) pair as you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, the text may be discarded too early and never be shown.
Ah yes that's where it gets interesting 🤔
In an ideal world, now that we've had glass TTYs for decades, there'd be some migration path to simpler semantics for apps, where \b would actually eat chars and \r would clear out everything since the last \n.
-
But what \r does in actual terminals, as well as existing Jupyter client-side output processing, is keep text in place, just move cursor to start of line. E.g. '11111\r 2' results in '2 111' (with cursor at the first 1) — if new text is shorter, remains of previous text are still visible.
This is rarely used; it's more of a nuisance that after \r apps may need to right-pad the next line with spaces to ensure previous gets over-written. But it does give flexibility, as you say, that printing '...\r' each time is near-equivalent to printing '\r...' each time (only difference is where cursor stays most of the time).
Anyway, I don't suppose a plugin shifting execution to server is the place to deviate from that behavior. -
Moreover, actual terminals [soft-]wrap long lines and \r only returns to start of current line; at least Jupyter avoids that complication (behaves closer to infinite-width terminal, with horizontal scrolling if necessary).
But your suffix
technique is interesting, I only now fully grokked it.
The Y model has no concept of representing or dislaying a "cursor", right?
By sometimes keeping an \r
unprocessed — and storing it as such in the Y doc¹ too — it can represent states where a line is not fully overwritten yet 👍.
(¹This falls back on client-side to hide text per \r \b. Funny but OK imho.)
-
So perhaps even the handle_*() functions themselves could adopt this approach to represent all partially-overwritten states? E.g. now, if \r is in middle and doesn't hit your special code path:
print('11111\r2 ', end='') time.sleep(1) print('b\b3', end='') # \b simply to avoid `last_output["text"].append(text)` branch
results in
2 1113
, note how3
got wrongly tacked at the end even though logical "cursor" should have stayed after the2
. If the intermediate state was kept in11111\r2
form, the "cursor position" could be retained. -
With my previous suggestion of (text, cursor) pair, the cursor position can't be persisted into Y doc, only kept in-memory in server.
However, it's not like we have any way to continue an in-progress execution when server dies, so I guess that's fine too.
@aftoul Did you make any progress on tests, or do you want me to write some? I feel it's not enough to unit-test handle_backspace() & handle_carriage_return(), we also better test some of the logic now living in _output_hook(). |
@cben I started working on the tests but life happened. I'll go back to them now that I have a bit less work |
if text.endswith((os.linesep, "\n")): | ||
text = text[:-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- BTW, this was an approximation, it loses data. Regular
foo\n
should not be treated same asfoo
. - And it'd be nice to tie the array structure of
"text": [...]
to newlines?
In regular Jupyterlab, here is what I see:
and here is the .ipynb representation:
> jq .cells[].outputs --compact-output Untitled.ipynb
[{"name":"stdout","output_type":"stream","text":["no trailing"]}]
[{"name":"stdout","output_type":"stream","text":["1 trailing\n"]}]
[{"name":"stdout","output_type":"stream","text":["2 trailing\n","\n"]}]
[{"name":"stdout","output_type":"stream","text":["foobar\n","baz"]}]
[]
=> in doc model, "text" array tends to start new item after every newline. 📜
https://nbformat.readthedocs.io/en/latest/format_description.html says it doesn't matter!
Some fields, such as code input and text output, are characteristically multi-line strings. When these fields are written to disk, they may be written as a list of strings, which should be joined with '' when reading back into memory. In programmatic APIs for working with notebooks (Python, Javascript), these are always re-joined into the original multi-line string. ...
However, keeping item-per-line is nice for .ipynb diff-ability.
=> in doc model, the last line does record whether it had a trailing newline. ✍️
=> the UI does not show whether the the last line had trailing newline 🙈
In a terminal, lack of trailing newline is evident by next prompt just appearing on same line e.g. no trailing>>>
but that'd be a sin against the structuredness of notebooks. And since 1 newline is the common case, UI didn't want to spend vertical space rendering it. (the first two cells have exactly same height)
UI does reflect >1 newlines at the end by showing blank lines.
But in theory, future/alternate UI might reflect trailing newline.
For comparison, here is the structure for same cells I'm getting with this branch:
[{"name":"stdout","output_type":"stream","text":["no trailing"]}]
[{"name":"stdout","output_type":"stream","text":["1 trailing"]}]
[{"name":"stdout","output_type":"stream","text":["2 trailing\n"]}]
[{"name":"stdout","output_type":"stream","text":["foo","bar\nbaz"]}]
- eats one newline off the 1 & 2 cases 🐞
- the array structure — while formally not important — reflects how it got streamed, less stable for diffs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ However, it appears the current Jupyterlab is not as lenient as the spec I cited, and does NOT concat multiple items without newlines into one line?
for i in range(30):
print(i, end=' ')
import time; time.sleep(1)
upstream results in json:
[{"name":"stdout","output_type":"stream","text":["0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 "]}]
which renders all on one line. With this extension (with or without this branch), I get:
[{"name":"stdout","output_type":"stream","text":["0 ","1 ","2 ","3 ","4 ","5 ","6 ","7 ","8 ","9 ","10 ","11 ","12 ","13 ","14 ","15 ","16 ","17 ","18 ","19 ","20 ","21 ","22 ","23 ","24 ","25 ","26 ","27 ","28 ","29 "]}]
which is supposed to be equivalent yet shows up in browser as each number on a separate line!
Highlighting #48 as of double relevance here:
|
Handle cell outputs containing
\r
and\b
. This PR fixes the issues such as #39, encountered when using TQDM and other similar packages that use special characters to format their output.